In [ ]:
from __future__ import division, print_function

9. Python functions


Sometimes a portion of code is reused over and over again in a script. To avoid repetitive coding, we can define our own functions using the def keyword. When invoked, a function instructs the computer to perform a fixed list of instructions, possibly returning an output at the end. Functions can also be defined and saved in another file (with extension .py) so that they can be reused in another project.

You will have encountered many pre-defined functions in Python up to now, such as print and len. There are many other built-in functions in Python which are extremely useful and make code more transparent and readable.

We will also encounter what are known as anonymous functions, which are defined using the lambda keyword. These are short functions whose body is a single expression written on one line. While def can technically do everything lambda does, lambda allows for a less pedantic and more natural style of coding. You will encounter anonymous functions a lot when working with pandas, especially when cleaning up data frames.

Finally, we introduce what is perhaps one of the more important concepts in Python programming: list comprehensions. These are syntactical shortcuts for the for loop which translate not only to better code style, but often to faster scripts.

9.1 Learning objectives

The objectives of this unit are:

  1. To use the def keyword to define functions.

  2. To recall the terms argument, keyword argument and signature of a function.

  3. To use lambda to define anonymous functions.

  4. To use the built-in functions zip and enumerate.

  5. To learn how to refactor for loops into list comprehension statements.

9.2 Let's build our own functions

Functions are defined using the def keyword. In general, a function definition looks like this:

def my_function_name(arg_1, arg_2, ..., arg_n) : 

    code

    return <something, or None> 

First, we tell Python that we are going to define a function by typing out def. Then we proceed by naming our function. The normal rules for naming variables apply to naming functions as well - one cannot start a name with a number or use special characters or use words which have been reserved for Python.

Then after the name, we describe the signature of the function by writing down every argument to the function, separated by commas and enclosed in round brackets ( ). An argument to a function is an input to the code which will be executed when the function is called. It is not mandatory that a function receive inputs. Sometimes a function just needs to run a set of instructions without any input. We end the def statement with a :. The new line under this marks the beginning of the function code.

Every line of code meant for the function must be indented. There are no enclosing { } to mark the "body" of the function. In Python, the "body" of the function is denoted by indentation only. Thus, every line of code meant for the function must be indented at least one level relative to the def statement. Finally, at the end of the function we return an output or None. This last statement is not mandatory (a function without a return statement returns None), but it is good practice to end a function definition with an explicit return statement.
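
As a minimal sketch of the pattern just described, consider the two functions below. The names add_one and say_hi are illustrative only and are not used elsewhere in this unit. The first function returns a value explicitly, while the second has no return statement at all, so Python hands back None on our behalf.

```python
def add_one(x):
    # Explicit return: the caller receives x + 1
    return x + 1

def say_hi():
    # No return statement here, so calling say_hi() evaluates to None
    print("hi")

result = add_one(2)
nothing = say_hi()

print(result)
print(nothing)
```

Both functions are valid Python. The explicit return in add_one simply makes it obvious to a reader what comes out of the function.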


In [ ]:
# Our first function

def my_first_function():
    pass

For our first function, we see above that my_first_function does not take in any input and does nothing. The pass keyword is a kind of temporary placeholder and basically does nothing. We use pass because one cannot leave a function "body" without any code at all.

Now let's code something into my_first_function so that it does something useful.


In [ ]:
def my_first_function():
    print("Hello world!")

my_first_function will print the string "Hello world!" whenever it is called. Calling a function basically means instructing Python to run the code contained in the function. Notice that after defining a function and running the cell, there is no output. But that doesn't mean nothing has happened. In fact, Python has populated the global namespace with a new name, my_first_function, and is ready to do whatever has been coded into this function when it is called.


In [ ]:
my_first_function

It is good to understand what happens when we type my_first_function and execute a cell. Notice that the output says <function ... This means that the variable my_first_function represents an object of type function. The rest of the output indicates that this function goes by the name my_first_function in the module __main__. We will not describe what modules are in this course, but it suffices for our purposes to think of __main__ as a file containing all the functions that we define in this Jupyter Notebook session.

To actually execute the instructions in my_first_function, we must type my_first_function().


In [ ]:
my_first_function()
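
The difference between naming a function and calling it can be sketched as follows. The names greet and same_function are made up for this illustration. Without parentheses we only refer to the function object; with parentheses the body actually runs.

```python
def greet():
    return "Hello world!"

same_function = greet      # no parentheses: a second name for the same object, nothing runs
message = same_function()  # parentheses: the function body runs now

print(callable(greet))
print(same_function is greet)
print(message)
```

This is why typing my_first_function on its own displays a function object, while my_first_function() produces the greeting.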

9.3 Functions with arguments

Functions would not be very useful if we were unable to pass input into them. Most of the time, the set of instructions acts on the input we have supplied to the function and produces some output, which is then assigned to a variable to be stored. Let's modify my_first_function to print out a name supplied as input to it.


In [ ]:
def my_first_function(name):
    print("Hello %s" % (name))
    return None

my_first_function("Tang U-Liang")

In [ ]:
# Passing two arguments

def special_product(x,y):
    prod = x-y+x*y
    return prod

When defining functions with arguments, the same variable name used in the signature must be used in the body of the function. Now there is nothing inherently special about using name to represent the argument for names to my_first_function. After all, the computer doesn't "understand" that we intend to print out a name when calling my_first_function. However, we should use recognizable variable names to improve the readability of our code and to make our intentions transparent.

In the function special_product, I passed two arguments named x and y. Inside the function, it performs the operation and assigns the result to a variable named prod. Then the function uses the keyword return to send the answer out from the function environment to the global environment.


In [ ]:
answer = special_product(1,3)
print(answer)

What happened is that the function special_product performs the said operation on inputs 1 and 3. It then outputs the answer, in this case 1 (since 1 - 3 + 1*3 = 1). We assign the output 1 to a variable named answer and print it.

Note that we do not need to explicitly declare a variable to "capture" the answer. The following works too.


In [ ]:
special_product(1,3)

Passing arguments in the correct sequence matters. Python assigns values to arguments in the order in which they were declared in the signature.


In [ ]:
# x = 1, and y =3
print(special_product(1,3))

# x =3 and y = 1
print(special_product(3,1))

What will happen if we try to display the variable prod directly?


In [ ]:
print(prod)

9.3.1 Function scope

But wasn't the variable prod already defined when we defined the function special_product? The error occurs because the variable prod is only available in the scope of the function. The global environment is another scope. In general, variables from one scope are not accessible in another scope, with exceptions given by scoping rules (which are programming language dependent). For this course, it suffices to know that variables defined in the function scope will NOT be accessible from the global scope.

(A function can be made to assign to a global variable by using the global keyword. But this is not encouraged.)
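
The scoping behaviour can be checked without crashing the cell. The sketch below repeats the definition of special_product so that it is self-contained, and uses a try/except block (a construct not otherwise covered in this unit) to catch the NameError. The flag prod_is_visible is a name invented for this illustration.

```python
def special_product(x, y):
    prod = x - y + x * y   # prod exists only while the function runs
    return prod

answer = special_product(1, 3)

try:
    prod                   # prod was never defined in the global scope
    prod_is_visible = True
except NameError:
    prod_is_visible = False

print(answer)
print(prod_is_visible)
```

The returned value survives in answer, but the name prod does not: only what is passed out through return reaches the global scope.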

9.4 Function defaults

When we call a function, every argument in its signature must be assigned a value. We cannot leave any out.


In [ ]:
special_product(1,)

Therefore, it becomes quite a hassle if we have to call the function in various places in our code with the same input for one of the arguments. To avoid this, we can assign default values to particular arguments in the following manner.

def my_function(arg_1, arg_2 = default_value, ...):

    code

Note that arguments assigned default values must come after arguments without default values. Also, default values do not lock you in: you can still override them when you need to.


In [ ]:
def special_product(x, y=1): # default value of y is 1
    return x-y+x*y

# We don't have to pass any value to arguments with default values
print(special_product(2))

# Default values can be overridden
print(special_product(2,9))

9.5 Passing arguments to functions by keyword

The arguments to a function have names, just as variables have names. The names of arguments to a function are called keywords. We can pass arguments to functions by assigning values explicitly to keywords like so:

my_function(keyword_1 = value_1, keyword_2 = value_2,...)

This gives enormous flexibility in using Python interactively. Most functions in the matplotlib and seaborn libraries have many arguments, almost all of which have default values. However, we often use only a few of these keywords, and it is quite a pain to remember the exact sequence of arguments in the function signature. Passing arguments by keyword allows us to pass them in any order convenient to us.


In [ ]:
special_product(1,3) == special_product(3,1)

In [ ]:
special_product(x=1, y=3) == special_product( y=3, x=1)
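
Keyword arguments are most useful when a function has several defaulted arguments and we only want to change one or two of them. The function make_label below is a hypothetical helper written for this sketch; it is not part of matplotlib or seaborn, but it mimics the shape of their signatures.

```python
def make_label(text, size=10, color="black", bold=False):
    # Hypothetical helper with several defaulted arguments
    style = "bold" if bold else "normal"
    return "%s [%d pt, %s, %s]" % (text, size, color, style)

default_label = make_label("Sales")                 # all defaults
red_label = make_label("Sales", color="red")        # skip size, change only color
big_bold = make_label("Sales", bold=True, size=14)  # keywords in any order

print(default_label)
print(red_label)
print(big_bold)
```

Without keywords, changing only bold would force us to spell out size and color as well, in exactly the right order.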

9.5.1 An application: A function to search for primes

To end this section, below is a function to determine whether a number is prime or not. We use this to refactor our prime listing code from the previous unit.


In [ ]:
import math

def is_prime(p):
    """
    This function determines if p is prime or not. 
    
    Returns:
        bool, True if p is prime. 
    """
    # Numbers below 2 (0, 1 and negatives) are not prime
    if p < 2:
        return False
    m = int(math.floor(math.sqrt(p)))    
    for d in range(2, m+1):
        if p%d == 0:
            return False
    return True

In [ ]:
for p in range(2, 101):
    if is_prime(p):
        print(p)

10. Lambda expressions


Lambda expressions are used to define short functions that may be written in one line of code. This is more than just a convenience. Many arguments to pandas and seaborn functions are intended to take in callables (functions), and lambda expressions provide a good syntactical way of passing functions as arguments to other functions.

Recall our definition of my_first_function:


In [ ]:
def my_first_function(name):
    print("Hello %s" % (name))

Notice that this function essentially consists of one line, namely the print statement. Using lambda expressions, this can be shortened to:


In [ ]:
printer = lambda name: print("Hello %s" % (name))

We use the lambda keyword to define lambda expressions. After lambda we type in the arguments to the function, but without enclosing them in ( ). All arguments must be separated by commas. Once that is done, type a : and follow it with one line of code which does whatever you want it to do. In this case, I simply want to print a name. Functionally, this lambda expression is equivalent to my_first_function. However, as you can see below, they are different objects.


In [ ]:
printer

Notice that printer is of class function but is given a name <lambda>. However, we can call printer just as we called my_first_function, by passing arguments to it.


In [ ]:
printer("Joe")

Lambda expressions can take on more than one argument. Here is the function special_product refactored as a lambda expression.


In [ ]:
special_product = lambda x, y: x-y+x*y

special_product(10,9)

Notice that I did not need to put a return to indicate which output to pass to the global environment. This is because lambda expressions are meant to be written in one line, hence it is understood that that one line of code is the output.

10.1 Use cases

Lambda expressions are also known as anonymous functions because we rarely assign lambda expressions to variables. Instead, they are passed directly to keywords or as arguments to many pandas functions or methods. Here is an example of how this is used with a pandas dataframe.

In what follows, we intend to calculate the ratio of sulphates to alcohol content for each sample (row) and assign it as a new column to the data frame. The data frame is displayed below and has been assigned to a variable named wine.


In [ ]:
import pandas as pd # Importing the pandas library

wine = pd.read_csv("winequality-red.csv", sep=';')

wine.sample(5)

Here's how this could be achieved. We first define the function that calculates the ratio and then proceed to create the new calculated column.


In [ ]:
def ratio(df): 
    """ This function calculates the ratio of sulphates to alcohol content in the wine dataframe
    
    Returns
        Series, shape (n_samples, ) Array containing the ratio of sulphate to alcohol content for each sample
    """
    ratio_col = df.sulphates/df.alcohol
    return ratio_col

(wine.assign(ratio_sul_to_alc=ratio)
     .head(5))

As you can see, a new column has been added with the calculated column named ratio_sul_to_alc. However, we had to define a function named ratio which we may or may not use again. We would like to achieve the same thing, but without populating the global namespace with unnecessary variables.

So let's do the same thing but with lambda expressions.


In [ ]:
(wine.assign(ratio_sul_to_alc=lambda df: df.sulphates/df.alcohol)
     .head(5))

Notice that they give the same answer. We will learn how to do this in detail in the next unit. For now, the purpose of this example is to illustrate how lambda expressions are a great help in simplifying and making code more compact and readable.
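
The same pattern works outside pandas. As a small sketch, the built-in sorted function accepts a callable through its key keyword, and a lambda expression can be supplied on the spot. The list libraries and the choice of sorting by last letter are arbitrary, made up for this illustration.

```python
libraries = ["seaborn", "pandas", "matplotlib", "numpy"]

# Sort by the last letter of each name; the lambda is never given a name of its own
by_last_letter = sorted(libraries, key=lambda word: word[-1])

print(by_last_letter)
```

No helper function is left behind in the global namespace afterwards, which is the point of using an anonymous function here.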

11. Built-in functions


Python has some built-in functions for coding purposes. While there are quite a few of them, the following two will be used quite often in handling dataframes and visualization:

  • zip, a utility function that produces tuples by pairing up the items of two lists (typically of equal length).
  • enumerate, which produces tuples of the form $(i, item_i)$ for $0\leq i < $ len(items).

We will also cover the concept of list comprehension in this section as it is an important programming concept and syntax in Python.

11.1 zip

To understand what zip does, we need to describe a rather simple data structure called a tuple. A tuple is like a list, with the difference being that it is immutable: its elements cannot be reassigned. Tuples are created by enclosing a sequence of objects separated by commas within round brackets ( ).


In [ ]:
pair = (1,4)

print(pair)

As with lists, tuples can also be indexed and sliced. However, once assigned, individual components of a tuple cannot be changed. For example, the following code will raise an error.


In [ ]:
pair[0] = 2

Think of tuples as lists which you wish to protect from changing by accidental assignment. Another way of thinking about tuples is as constant lists, or as "coordinates" in $\mathbb{R}^n$.
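
Here is a short sketch of what "protected" means in practice. Reading from a tuple works exactly as with a list, assignment raises a TypeError, and the way to get a changed tuple is to build a new one. The names assignment_worked and new_pair are illustrative.

```python
pair = (1, 4)
first = pair[0]            # reading by index is allowed

try:
    pair[0] = 2            # item assignment is not allowed on tuples
    assignment_worked = True
except TypeError:
    assignment_worked = False

new_pair = (2, pair[1])    # build a new tuple instead of modifying the old one

print(first)
print(assignment_worked)
print(pair, new_pair)
```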

Given two lists with items $$ x_1, \ldots, x_n$$ and $$ y_1, \ldots, y_n$$ zip produces a new list of tuples in the following form $$ (x_1, y_1), \ldots, (x_n, y_n)$$

To understand how zip works, let's try to replicate its function using a for loop.


In [ ]:
zipped = list() # This creates an empty list 

my_colleagues = ['Andy', 'Lisa', 'Dayton']
ages = [29, 24, 50]

for i in range(0,3):
    zipped.append((my_colleagues[i], ages[i]))

print(zipped)

Imagine having to write such a snippet of code every time we need to do something with elements from two lists! As you can imagine, it bloats the code and distracts from the main logic of the program.

Here's how zip is typically used in a program.


In [ ]:
for tup in zip(my_colleagues, ages):
    name = tup[0]
    age = tup[1]
    print("%s's age is %d" % (name, age))

In fact, we can do even better in terms of readability. We can utilize what is known as list unpacking (also called tuple unpacking) to rewrite this for loop.


In [ ]:
for name, age in zip(my_colleagues, ages): # The syntax name, age is what is known as list unpacking
    print("%s's age is %d" % (name, age))

Of course, zip is used in many contexts other than simplifying for loops. Can you think of any other situations where you might need to use zip?
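
One possible answer, offered as a sketch rather than the only use: zip pairs up keys and values, so passing its result to dict builds a lookup table in one step. The example also shows what happens when the two lists differ in length, namely that zip stops at the shorter one. The lists are redefined here (as names and ages) so that the snippet stands on its own.

```python
names = ['Andy', 'Lisa', 'Dayton']
ages = [29, 24, 50]

# Build a dictionary mapping each name to the matching age
age_of = dict(zip(names, ages))

# With unequal lengths, zip silently stops at the shorter list
truncated = list(zip(names, [29, 24]))

print(age_of['Lisa'])
print(truncated)
```

Wrapping zip in list makes the snippet behave the same way whether zip returns a list (Python 2) or a lazy iterator (Python 3).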

11.2 enumerate

As the name of this function suggests, enumerate is useful when we wish to produce a count of items in a list. This is one of the most useful functions you will ever use in Python. Its utility cannot be overstated.

enumerate works by producing from a list $$ x_0, x_1, \ldots, x_n$$ the following list of tuples (note the 0 indexing) $$(0, x_0), (1, x_1), \ldots, (n, x_n) $$.

We can use list unpacking to capture both the enumerated index and the object itself. You will use enumerate most often in for loops. Below is an example where we wish to assign staff names to staff id numbers based on a running serial number.


In [ ]:
staff_id = dict()

for i, name in enumerate(my_colleagues):
    id_no = 's2017-'+str(i) # The str function coerces an integer i into 'i'
    staff_id[id_no] = name

print("A list of staff id numbers")    
print(staff_id.keys())
print("and the respective staff names")
print(staff_id.values())
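
If serial numbers should begin at 1 rather than 0, enumerate accepts an optional start argument. The sketch below repeats the loop above with that one change; my_colleagues is redefined so that the snippet is self-contained.

```python
my_colleagues = ['Andy', 'Lisa', 'Dayton']
staff_id = dict()

# start=1 makes the running index begin at 1 instead of 0
for i, name in enumerate(my_colleagues, start=1):
    staff_id['s2017-' + str(i)] = name

print(sorted(staff_id.items()))
```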

11.3 List comprehension

One of the great things about for loops in Python is that they are easy to write and understand. However, this comes at a cost: time. for loops in Python are slow. Thus, if the code runs through a large number of iterations, it will take time.

Let's see this. The scenario here is that we want to assign a running serial number to staff id. This will involve coercing int into str type. The thing is, we have 5000 staff. So let's see how much computer time it takes by using %%timeit.


In [ ]:
%%timeit

serial_numbers = list()
for i in range(0,5000): # 5000 staff, so we need 5000 int's
    serial_numbers.append('s'+str(i)) # our serial numbers are prefixed with 's'

Notice that the entire script needed about 1.8 ms to execute. This isn't exactly a short amount of time as far as computers go. Just imagine having to do this 10 times in a row!

Fortunately Python implements what is known as list comprehension which is a way of writing for loops in a more compact and abstracted way. If we think about a for loop to create a list element by element, the code will look something like this:

for i in iterable:

    do code and return result_i
    append(result_i) to list[i]

Python list comprehension provides an alternative syntax which does the exact same thing but faster. The syntax is:

[do code for i in iterable]

The code is enclosed in [ ] because we are using list comprehension to create a list using a set instruction for each item in the iterable (think of iterables as a list). If one is familiar with set notation from calculus courses, list comprehension is syntactically similar to the following $$\{\, f(x_n) \mid x_n \in A\}$$ where $f$ is some function meant to be evaluated element-wise on each element in a set $A$.

Now let's refactor the for loop above using list comprehension and time the script.


In [ ]:
%%timeit

serial_numbers = ['s'+str(i) for i in range(5000)]

That's an improvement of about 10.2 %!

Let's see another example to really familiarize ourselves with list comprehensions. In the following example, we wish to extract the first three letters of the months in a year and capitalize them.


In [ ]:
months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", 
          "November", "December"]

# multiline statements are allowed in Python as long as they are enclosed in some sort of braces. 

short_name = []
mk_list = short_name.append # Here's a neat trick, assign the append method to a variable mk_list.
                            # mk_list is now a function

for month in months:
    mk_list(month[0:3].upper()) # .upper() is a string method that simply capitalizes all letters in a string. 
    
print(short_name)

To refactor this into a list comprehension statement, we first identify the code that is being looped over. That is

mk_list(month[0:3].upper())

However, this composite statement can be broken down into steps:

  1. month[0:3] just extracts the first 3 letters from month

  2. Calling the .upper() method on the string of 3 letters capitalizes them.

  3. Calling mk_list is essentially the task of appending the result of the previous two steps to the list short_name.

In list comprehension, the last step is taken care of. Thus, the essential part of the code is month[0:3].upper(). Now we identify the iterable: This is simply the list months (note plural).

What is the variable to indicate the particular month as we iterate over the list of months? This is simply denoted by the name month (note singular). There is nothing particularly special about using month to name each object in the list months. I could just as easily have used mon. In that case, the essential part of the code which is being looped over should be written mon[0:3].upper(). With that clarified, the list comprehension statement is


In [ ]:
short_name = [month[0:3].upper() for month in months]
print(short_name)
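
List comprehensions combine naturally with zip and enumerate from the previous sections, since unpacking works inside a comprehension just as it does in a for loop. The following is a sketch using the colleague data from earlier, redefined here so that it runs on its own; the names lines and numbered are illustrative.

```python
my_colleagues = ['Andy', 'Lisa', 'Dayton']
ages = [29, 24, 50]

# zip inside a comprehension: one formatted string per (name, age) pair
lines = ["%s's age is %d" % (name, age) for name, age in zip(my_colleagues, ages)]

# enumerate inside a comprehension: prefix each name with its index
numbered = ["%d. %s" % (i, name) for i, name in enumerate(my_colleagues)]

print(lines)
print(numbered)
```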

12. A concluding demonstration


In this last section, I want to pose a challenge for us to solve.

I want you to create a function which will return the day of the week for a given date input. For example, this function should return THURSDAY for an input of 14-09-2017 (in DD-MM-YYYY format). The function must be able to accept any date in the past or the future. It must remain valid even when the date is in a past century. Your function signature can be the following:

def weekday_from_date(day=1, month=1, year=2017):
    <code>
    return <weekday as an int or str>

When returning an int to represent a weekday, we use 1=Monday, 2=Tuesday, ..., 6=Saturday, 0=Sunday.

I have actually prepared most of the coding already. You will need to just code in one small section to complete this assignment.

How this function works

In order to determine the day of the week from a given date, the most straightforward way is to count the number of days starting from today to the target date. This is complicated only by the fact that months can have 28-31 days.

We first have to determine whether the target date is in the future or past. This tells us whether to add or subtract the day difference to the current week day.

Next of course is to determine the day difference between target date and current date.

An example will serve to illustrate the idea: Days between 15-09-2017 and 18-10-2020 = days between 15-09-2017 to 15-09-2020 + days between 15-09-2020 to 15-10-2020 + days between 15-10-2020 to 18-10-2020.

The correct day of week in terms of its integer code is just the remainder of current day of week $\pm$ day difference divided by 7 since there are seven days in a week.
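
The remainder step can be sketched on the example dates above. Note the assumption: this sketch lets the standard library's datetime.date do the day counting by subtracting two dates, which is not the hand-counting approach this exercise asks for. It only illustrates the modulo-7 arithmetic and offers a way to cross-check results.

```python
from datetime import date

DAY_OF_WEEK = {1: "MONDAY", 2: "TUESDAY", 3: "WEDNESDAY", 4: "THURSDAY",
               5: "FRIDAY", 6: "SATURDAY", 0: "SUNDAY"}

start = date(2017, 9, 15)
target = date(2020, 10, 18)

# isoweekday() gives 1=Monday ... 7=Sunday; taking % 7 maps Sunday to 0
start_code = start.isoweekday() % 7

# Subtracting two dates gives a timedelta; .days is the day difference
day_difference = (target - start).days

# Target is in the future, so we add the difference and take the remainder
target_code = (start_code + day_difference) % 7

print(day_difference)
print(DAY_OF_WEEK[target_code])
```

Here the day difference is 1096 + 30 + 3 = 1129 days, matching the three-part split in the example, and the remainder lands on a Sunday.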

The assignment

You will need to fill in the empty section of the function month_diff, which is meant to calculate the number of days between two dates which differ only by month, e.g. it calculates the number of days between 1st April 2017 and 1st September 2017, and not between 4th April 2017 and 7th May 2017.

What code will achieve the correct answer?

You may check the correctness of your code with this website. Run the final function weekday_from_date with a choice of dates as you like and use the website to counter check the returned value. If the answers match, more likely than not, you've succeeded.


In [44]:
from datetime import datetime

DAY_OF_WEEK = {1: "MONDAY", 2: "TUESDAY", 3:"WEDNESDAY", 4:"THURSDAY", 5:"FRIDAY", 6:"SATURDAY", 0:"SUNDAY"}

def todays_date():
    t0 = datetime.today()
    return t0.isoweekday(), t0.day, t0.month, t0.year

# Returns day difference if target date is within same month and year
def day_diff(start_date, end_date): 
    return (end_date[0] - start_date[0])

# Returns day difference if target date may be in differing months but within same year.
# Remember to account for leap years!
def month_diff(start_date, end_date):
    start_month, end_month, end_year = start_date[1], end_date[1], end_date[2]
    total_days = 0
    
    for m in range(min(start_month, end_month), max(start_month, end_month)):
        
        # Enter your answer here
        
        
        
        
        
        # End of answer
        
    # It is quite possible that start_month exceeds end_month. In this case, 
    # we are actually counting days "backwards"! We then have to actually return 
    # the negative value so that this number of days is subtracted from the total. 
    if start_month < end_month:
        return total_days
    else:
        return -1*total_days
    
# Returns True if y is a leap year in the Gregorian calendar:
# divisible by 4, except century years, which must be divisible by 400.
def is_leap_year(y):
    return y%4==0 and (y%100!=0 or y%400==0)

# Returns day difference across different years
def year_diff(start_date, end_date):
    start_year, end_year = start_date[2], end_date[2]
    total_days = 0
    
    # Adjusting for the fact that in a leap year, the extra day occurs on the last day of Feb. 
    leap_year_adj = 0
    if end_date[1] >= 3 and is_leap_year(end_date[2]):
        leap_year_adj += 1
    if start_date[1] >= 3 and is_leap_year(start_date[2]):
        leap_year_adj += -1
    
    for y in range(start_year, end_year):
        if is_leap_year(y):
            total_days += 366
        else:
            total_days += 365
    return total_days + leap_year_adj

# Returns day of week for given date
def weekday_from_date(day, month, year):
    curr_date = todays_date()
    
    # Checking whether the target_date is in the future (relative to the current date)
    # or not
    conds = [curr_date[3] < year, 
             curr_date[3] == year and curr_date[2] < month,
             curr_date[3] == year and curr_date[2] == month and curr_date[1] < day]
    if any(conds):
        start_date, end_date = curr_date[1:], (day, month, year)
        is_future = True
    else:
        start_date, end_date = (day, month, year), curr_date[1:]
        is_future = False
    
    # Getting the difference in days between the current date and the target date
    number_days = (year_diff(start_date, end_date)
                      + month_diff(start_date, end_date)
                      + day_diff(start_date, end_date))
    
    if is_future:
        target_weekday = curr_date[0] + number_days
    else:
        target_weekday = curr_date[0] - number_days
    
    return DAY_OF_WEEK[target_weekday%7]

In [ ]:
weekday_from_date(15,10,1984)

Looking forward

This course represents just a glimpse of what you can do with Python! There's much much more to learn.

However what you have learnt today will enable you to tackle the challenges of learning how to deal with data using pandas and visualize it using matplotlib or seaborn. Things that we did not cover in this course are:

  • Object Oriented Programming. The class keyword, object inheritance and methods.
  • Iterators and generators. The yield keyword.
  • Decorators and decorator statements. Decorators are used in IPython interactive widgets and web development frameworks.
  • Modules and packages.
  • Unit testing.
  • Debugging and code profiling.

The knowledge you have gained here is enough to enable you to explore these topics even further. All the best!

